CooP - an organic, free range artisanal heritage chicken farm¶


Providing improved packaging with lineage information for each individual bird.

Data Schema¶



The Data Schema Represents chicken and egg family trees

Chickens have:

  • Chicken ID (C_ID)
  • Name (Name)
  • Sex (Rooster or Hen) (Sex)
  • Feather color (Color)
  • Favorite song (Favorite_Song)
  • Each Hen lays eggs
  • Generation ID (G_ID) int

Eggs have:

  • An identification number (E_ID) will end up mapping to C_ID
  • Location in the incubation hall where hens sit on their eggs. (IH_LOC)
  • Whether that spot is near a window (as 1/3rd of the spots should be) (NW - Boolean [True, False])
  • Parent IDs (PH_ID, PR_ID)(Will need to map to C_IDs)

Chicken Genealogy:

  • The egg that it came from (E_ID)
  • Its parents (PH_ID) - (PR_ID), and their eggs - (PHE_ID, PRE_ID)
  • The Grandparents, etc) - HGPH_ID, HGPR_ID, RGPH_ID, RGPR_ID
  • any additional columns to the chicken and egg tables or any other way needed.

Note: You will probably need more columns than just the above minimum information.

Generate data¶


Two weeks after your starting, all records were destroyed after a ransom malware attack scrambled the database filesystem. Despite the farmworkers trying to remember all the chicken's names, it's impossible to tell them apart now.

We need to recreate, (generate fake data) about all chickens currently on the farm (1000 chickens).
Some of whom are parents to others.

Generate the required 1000 records

What can you do to make these records seem as realistic as possible? (have realistic timelines and age) (Feel free to look up data as you need to, but tell us what you looked up?)

  • Bonus: How could a government official check whether the dataset is faked or not? (most chicken species have documented egg rates and age before producing egges)
  • Bonus Bonus: What can you do to cover up these checks? (use the published ranges with randomization over actual calendar days to make data more realistic)
  • Bonus Bonus Bonus: What can a government official check to see whether you're covering up their checks. Etc

Name tags for Guided Tours¶


We Give guided tours of the farm and introduce all the chickens to visitors. To make this possible we print tags and attached to each chicken's leg.

The Tag includes:

  • The Chicken's name

  • Their Favorite song

  • Their Parents

  • Their Grandparents

  • The Location each parent and grand parent was incubated

  • A randomly selected first cousin of the chicken

    https://github.com/aruljohn/popular-baby-names/blob/master/2000/boy_names_2000.csv
    https://github.com/fivethirtyeight/data/blob/master/classic-rock/classic-rock-song-list.csv

Bonus¶


Create a dashboard in Metabase that shows some KPIs for this chicken farm. Please include either a public link or a screenshot.

Assumptions¶

Start Date¶

On March 11, 2020, the World Health Organization (WHO) declared COVID-19, the disease caused by the SARS-CoV-2, a pandemic. The announcement followed a rising sense of alarm in the preceding months over a new, potentially lethal virus that was swiftly spreading around the world.

Hen to Rooster Ratio¶

https://www.thehappychickencoop.com/whats-the-perfect-ratio-of-hens-to-roosters/

List of Chicken Breeds¶

https://www.typesofchicken.com/best-chickens-for-texas-humidity/

https://starmilling.com/poultry-chicken-breeds/#:~:text=There%20are%209%20recognized%20colors,tailed%20Buff%2C%20White%20and%20Columbian.

Breeds for Texas Best-egg-laying breed in Texas that doesn’t have the issues with humidity, heat, and fertility yet still lays pretty well, we suggest the Mediterranean sorts:

  • Ancona
  • Catalana
  • Egyptian Fayoumi
  • Leghorn
  • Hamburg

Ancona Details for Texas¶

https://www.typesofchicken.com/keeping-ancona-chickens/

Hatching Rates¶

https://rosehillfarm.ca/2020/04/25/incubating-chickeneggs/#:~:text=Eggs%20typically%20hatch%20at%20a,just%20a%20law%20of%20averages

In [1]:
import pandas as pd
import datetime
from datetime import date
from calendar import Calendar, monthrange
from dateutil.rrule import rrule, DAILY
import uuid
import numpy as np
np.random.seed(1)
import random

#We will use duckdb for Metabase analyztics and visualization
import duckdb
# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

import warnings
warnings.filterwarnings('ignore')

import plotly.tools as tls
import plotly.express as px
import plotly.io as pio
In [2]:
## Paramanters

start_date = '03/11/2020'
start_hen_count =45
start_rooster_count = 1

breed = 'Anacona' # Anacona is heat tolerant and good for texas
color = 'Black&White'   #They are Black with white specs
expected_egg_rate_min = 180
expected_egg_rate_max = 220

min_egg_producing_age = '20 weeks' #5 months
anacona_lifespan_min = '8 years'
anacona_lifespan_max = '12 years'


#Daily Log is the Audit Log of every day since the Farm began operation.
#Log can be used to track Month, High&Lo Temps, Humidity, Number of Sunlight Hours, 
# Number of Eggs collected, Number of Eggs hatched, total number of chickens etc.

#Create initial daily log Pandas Dataframe
dailylog_df = pd.DataFrame()

#Initial Farm Operation Investment (4 hens & 1 rooster)
int_hen_list = [['Mary', 'Hen'], ['Pat', 'Hen'], ['Barb', 'Hen'], ['Liz', 'Hen'] ]
int_rooster_list = [['Ralph', 'Rooster']]

#Location
location = [['A', 'False'],['B', 'True'],['C', 'False']]
location_df = pd.DataFrame(location, columns = ['IH_Location', 'Near_Window'])

#Time delay from purchase to production as we purchased young chicks
delay_start = '1 Month or 5 weeks'
egg_production_start = '04/15/2020'

#Create Empyt Eggs Dataframe from list
egg_columns = ['E_ID', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW' ]
egg_df = pd.DataFrame(columns=['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW' ])

#Songs Table - additional attributes can be uese to plot and analyze
songs_df = pd.read_csv('Coop/songlist.csv')

hens_names = pd.read_csv('Coop/census-female-names.csv', usecols=[0], header=None)
rooster_names = pd.read_csv('Coop/boy_names_2000.csv', usecols=[1], header=None)

sex_list = ['Rooster', 'Hen']
rooster_names_list = rooster_names[1].tolist()
hen_names_list = hens_names[0].tolist()
In [3]:
#Convert Near_Window to Boolean Values
location_df['Near_Window'] = location_df['Near_Window'].map({'False':False, 'True':True})
In [4]:
#Egg Production Rates
egg_rate_min = expected_egg_rate_min / 365
egg_rate_max = expected_egg_rate_max / 365
print("min_rate/day = ", egg_rate_min, "\nmax_rate/day = ", egg_rate_max)
min_rate/day =  0.4931506849315068 
max_rate/day =  0.6027397260273972
In [5]:
#Create initial Chicken Dataframe from investment:

chicken_list = int_hen_list + int_rooster_list

df = pd.DataFrame (chicken_list, columns = ['Name', 'Sex'])
df['Color'] = 'Black&White'
#df
In [6]:
#Egg Rate Per Chicken (Rooster = 0)
#egg_rate_min egg_rate_max

#Randomly Assign Egg Rate to Hens in Base Family
rate = np.random.randint(expected_egg_rate_min, expected_egg_rate_max, size=4)
df['egg_rate_per_year'] = pd.DataFrame(rate, columns=['egg_rate'])
In [ ]:
 

Initial Flock¶

The initial investment, Flock Size consisted of 4 Hens and 1 Rooster

In [7]:
df.head(5)
Out[7]:
Name Sex Color egg_rate_per_year
0 Mary Hen Black&White 217.0
1 Pat Hen Black&White 192.0
2 Barb Hen Black&White 188.0
3 Liz Hen Black&White 189.0
4 Ralph Rooster Black&White NaN

Generate Unique IDs for each bird in the Initial Flock¶

In [8]:
#Map initial Family to unique Chicken_IDs
#generate the number of IDs that are needed to be applied to each bird with a unique ID
names = df['Name'].tolist()

# generte Unique ids
ids = np.random.randint(low=1e6, high=1e9, size = len(names))

# maps ids to names
maps = {k:v for k,v in zip(names, ids)}

# add new id column
df['C_ID'] = df[['Name']].agg(' '.join, 1).map(maps)

#Add Generation to Table - Initial_Flock, Gen1, Gen2, Gen3, Gen4
#The Generation column will help us track the operation growth and family, etc.
#Assign Initial_Flock to our initila flock birds
df['Generation'] = "Initial_Flock"
In [9]:
#Uncomment to show Birds with Names and IDs
df
Out[9]:
Name Sex Color egg_rate_per_year C_ID Generation
0 Mary Hen Black&White 217.0 225766667 Initial_Flock
1 Pat Hen Black&White 192.0 631311759 Initial_Flock
2 Barb Hen Black&White 188.0 397591248 Initial_Flock
3 Liz Hen Black&White 189.0 630559425 Initial_Flock
4 Ralph Rooster Black&White NaN 800981516 Initial_Flock

Generating egg generation rates and totals eggs of initial Flock¶

In [10]:
#Calculate end-date using today's date
end_date = date.today()
today = date.today()
In [11]:
# initializing the start and end date
start_date = date(2020, 4, 15)
end_date = date(2020, 3, 31)
In [12]:
# iterating over the dates
#for d in rrule(DAILY, dtstart=start_date, until=end_date):
 #   print(d.strftime("%Y-%m-%d"))

Use pandas to Iterate through a range of dates
use the Pandas date_range() function method. It returns a fixed frequency DatetimeIndex. Syntax: pandas.date_range(start, end)

Parameter:
start is the starting date
end is the ending date

Generate a Daily Farm Log that can be used to track daily operations¶

In [13]:
# specify the start date is 2021 jan 1 st
# specify the end date is 2021 feb 1 st
dailylog_df['Dates']= pd.date_range(start='04/15/2020', end=today)
In [14]:
df['diff_years'] = (dailylog_df.iloc[-1]['Dates'] - dailylog_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
df['egg_count'] = df['egg_rate_per_year'] * df['diff_years']
In [15]:
print("The total number of Farm Operations Days to date:", dailylog_df.shape[0])
The total number of Farm Operations Days to date: 1050

This provides an expectation on the number of total eggs from the Initial Flock since the start of Farm Operations

In [16]:
df.head()
Out[16]:
Name Sex Color egg_rate_per_year C_ID Generation diff_years egg_count
0 Mary Hen Black&White 217.0 225766667 Initial_Flock 2.872064 623.237986
1 Pat Hen Black&White 192.0 631311759 Initial_Flock 2.872064 551.436374
2 Barb Hen Black&White 188.0 397591248 Initial_Flock 2.872064 539.948117
3 Liz Hen Black&White 189.0 630559425 Initial_Flock 2.872064 542.820181
4 Ralph Rooster Black&White NaN 800981516 Initial_Flock 2.872064 NaN
In [17]:
#Round Egg Count down as egg counts have to be a whole number, Int.
df['egg_count_round'] = df['egg_count'].apply(np.floor)

#Create a seperate Hens and Rooster Table for reporting and Management.
hens_df = df[df['Sex'].str.match('Hen')]
roosters_df = df[df['Sex'].str.match('Rooster')]

#Convert the egg_count from Floats to Integers
hens_df['egg_count_round'] = hens_df['egg_count_round'].astype(int)
In [18]:
#Generate the toal Egg expectancy from the Initial Flock which will be used to generate the Eggs Table.
print("We expect the initial Flock has produced the following number of Eggs by this time:")
hens_df['egg_count_round'].sum()
We expect the initial Flock has produced the following number of Eggs by this time:
Out[18]:
2255
In [19]:
#Uncommend to show the current Hens Table.
#hens_df
In [20]:
print("The current Egg Table contains the following Columns:")
egg_df.columns
The current Egg Table contains the following Columns:
Out[20]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW'], dtype='object')
In [21]:
print("The current Location Table contains the following Columns:")
location_df.columns
The current Location Table contains the following Columns:
Out[21]:
Index(['IH_Location', 'Near_Window'], dtype='object')
In [22]:
#Uncomment to show the current Rooster Table:
#roosters_df

Generate First Generation Eggs¶

In [23]:
print("Looping through each Hen and their associated egg counts per Hen to generate the Egg Table")
#eggs_df = pd.DataFrame()

#convert location table to a list to radomonly assign egg location - egg can only have 1 of three locations based on assumptions.
location_list = location_df['IH_Location'].tolist()
number_of_samples = 1
NW = 'False'

#Convert Rooster table to List to radomon aassign Rooster to Egg:
rooster_list = roosters_df['C_ID'].tolist()

for index, row in hens_df.iterrows():
    eggs_per_hen = row['egg_count_round']
    hen = row['Name']
    PH_ID = row['C_ID']
    #print(row['egg_count_round'])
    print( row['Name'], eggs_per_hen)
    
    for i in range(eggs_per_hen):
        #Create a new random Egg ID - E_ID
        # generte random integer ids
        # generate egg id
        E_ID = np.random.randint(low=1e3, high=1e9)
        #generate Parent Rooster ID - PR_ID
        PR_ID = random.choices(population=rooster_list, k=number_of_samples)
        
        #Generate Location and assign to each egg
        IH_LOC = random.choices(population=location_list, k=number_of_samples)
                        
        #Append each egg to the Egg Table:
        egg_df = egg_df.append(pd.DataFrame({'E_ID': E_ID, 'PH_Name': hen, 'PH_ID': PH_ID, 'PR_ID': PR_ID, 'IH_LOC': IH_LOC, 'NW': NW}, index=[0]), ignore_index=True)
Looping through each Hen and their associated egg counts per Hen to generate the Egg Table
Mary 623
Pat 551
Barb 539
Liz 542
In [24]:
#Add Parent_Gen Column to the Egg Table: The initial entry will be Initial_Flock, Gen1, Gen2, Gen3, Gen4
#The Parent_Gen column will help track the operation growth and family, etc.
#Assign Initial_Flock to our initila flock birds
egg_df['Parent_Gen'] = "Initial_Flock"
In [25]:
print("The Inital Flock has generated", egg_df.shape[0] ,"total eggs and added to the Egg Table")
The Inital Flock has generated 2255 total eggs and added to the Egg Table
In [26]:
print("The total number of unique Egg IDs:", egg_df['E_ID'].nunique())
The total number of unique Egg IDs: 2255
In [27]:
egg_df.head(2)
Out[27]:
E_ID PH_Name PH_ID PR_ID IH_LOC NW Parent_Gen
0 592323119 Mary 225766667 800981516 C False Initial_Flock
1 410431189 Mary 225766667 800981516 C False Initial_Flock
In [28]:
#Check for duplicate IDs
egg_df[egg_df.duplicated(['E_ID'], keep=False)]
Out[28]:
E_ID PH_Name PH_ID PR_ID IH_LOC NW Parent_Gen
In [29]:
#Set Location values for Window Location based on IH_LOC == B
egg_df.loc[(egg_df['IH_LOC'] == 'B'), 'NW'] = True 
In [30]:
ev_df = egg_df['PH_Name'].value_counts().rename_axis('Hen').reset_index(name='egg_count')
ev_df
Out[30]:
Hen egg_count
0 Mary 623
1 Pat 551
2 Liz 542
3 Barb 539
In [31]:
#Plot Sensor Execution Counts per hour
fig = px.scatter(ev_df, x='Hen', y='egg_count', 
              template = 'none',
              color = 'egg_count',
              size = 'egg_count',
            title=f"Egg count per hen")
pio.write_html(fig,  f'totaleggs.html')
fig.show()
In [32]:
df.columns
Out[32]:
Index(['Name', 'Sex', 'Color', 'egg_rate_per_year', 'C_ID', 'Generation',
       'diff_years', 'egg_count', 'egg_count_round'],
      dtype='object')
In [33]:
# Initial Flock Chickens Table
chickens_df = df[['Name', 'Sex', 'Color', 'C_ID', 'Generation']]
#Assign NaN values to First Generation Chickens for E_ID, PH_Name, PH_ID, PR_ID

Assign Favorite Songs to Initial Flock Chickens¶

In [34]:
songs_df.head(5)
Out[34]:
Song Clean ARTIST CLEAN Release Year COMBINED First? Year? PlayCount F*G
0 Caught Up in You .38 Special 1982 Caught Up in You by .38 Special 1 1 82 82
1 Fantasy Girl .38 Special NaN Fantasy Girl by .38 Special 1 0 3 0
2 Hold On Loosely .38 Special 1981 Hold On Loosely by .38 Special 1 1 85 85
3 Rockin' Into the Night .38 Special 1980 Rockin' Into the Night by .38 Special 1 1 18 18
4 Art For Arts Sake 10cc 1975 Art For Arts Sake by 10cc 1 1 1 1
In [35]:
#Create Song list and assign to Chickens
Generation = "Initial_Flock"
song_list = songs_df['Song Clean'].to_list()
chickens_df["Favorite_Song"] = np.random.choice(song_list, size=len(chickens_df))
#chickens_df['Genration'] = Generation
In [36]:
chickens_df
Out[36]:
Name Sex Color C_ID Generation Favorite_Song
0 Mary Hen Black&White 225766667 Initial_Flock Paradise By The Dashboard Light
1 Pat Hen Black&White 631311759 Initial_Flock With a Little Luck
2 Barb Hen Black&White 397591248 Initial_Flock Walk On The Wild Side
3 Liz Hen Black&White 630559425 Initial_Flock Wild Horses
4 Ralph Rooster Black&White 800981516 Initial_Flock Pride of Man
In [37]:
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
Out[37]:
Name Sex Color C_ID Generation Favorite_Song

Convert Initial Flock Eggs to Gen 1 Chickens¶

In [38]:
eggconversion_ratio = 0.25
In [39]:
hens_df
Out[39]:
Name Sex Color egg_rate_per_year C_ID Generation diff_years egg_count egg_count_round
0 Mary Hen Black&White 217.0 225766667 Initial_Flock 2.872064 623.237986 623
1 Pat Hen Black&White 192.0 631311759 Initial_Flock 2.872064 551.436374 551
2 Barb Hen Black&White 188.0 397591248 Initial_Flock 2.872064 539.948117 539
3 Liz Hen Black&White 189.0 630559425 Initial_Flock 2.872064 542.820181 542
In [40]:
#Calculate Selection Rate to hatch per Hen
first_window = 30 * 6 * 0.4
#time = 6 months or egg_rate_per_year / 2
In [41]:
first_gen_selection = hens_df[['Name', 'Sex', 'egg_rate_per_year', 'C_ID', 'Generation']]
first_gen_selection['Counts'] = (first_gen_selection['egg_rate_per_year'] / 2 * 0.27).apply(np.floor)
#df['egg_count_round'] = df['egg_count'].apply(np.floor)
In [42]:
first_gen_selection
Out[42]:
Name Sex egg_rate_per_year C_ID Generation Counts
0 Mary Hen 217.0 225766667 Initial_Flock 29.0
1 Pat Hen 192.0 631311759 Initial_Flock 25.0
2 Barb Hen 188.0 397591248 Initial_Flock 25.0
3 Liz Hen 189.0 630559425 Initial_Flock 25.0
In [43]:
#Select Eggs from Hens - 53% hens - 47% Roosters
In [44]:
first_gen_slicer = first_gen_selection[['Name', 'C_ID', 'Counts']]
first_gen_slicer['Counts'] = first_gen_slicer['Counts'].astype(int)
In [45]:
egg_df.columns
Out[45]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW', 'Parent_Gen'], dtype='object')
In [46]:
chicken_first = pd.DataFrame()
for index, row in first_gen_slicer.iterrows():
    chicks_per_hen = row['Counts']
    hen = row['Name']
    PH_ID = row['C_ID']
    
    print( row['Name'], chicks_per_hen)
    
    # Creating a second dataframe that will be the subset of main dataframe
    #print("Second data frame")
    dataframe_first = egg_df[['E_ID', 'PH_Name', 'PH_ID', 'PR_ID']].sample(n=chicks_per_hen)
    chicken_first = chicken_first.append(dataframe_first)
    chicken_first['Generation'] = 'Gen1'
    chicken_first['Color'] = color
    chicken_first['C_ID'] = chicken_first['E_ID']
    
Mary 29
Pat 25
Barb 25
Liz 25
In [47]:
#chicken_first
In [48]:
#Assign Favorite Song to First Gen Chickens
chicken_first["Favorite_Song"] = np.random.choice(song_list, size=len(chicken_first))

#Assign Sex to First Gen Chickents
chicken_first['Sex'] = np.random.choice(sex_list, size=len(chicken_first))

first_gen_rooster_count = chicken_first[chicken_first['Sex'] == "Rooster"].shape[0]
first_gen_rooster = chicken_first[chicken_first['Sex'] == "Rooster"]

first_gen_hen_count = chicken_first[chicken_first['Sex'] == "Hen"].shape[0]
first_gen_hen = chicken_first[chicken_first['Sex'] == "Hen"]

print("The first Geration has produced:")
print("================================")
print(first_gen_rooster_count, "Roosters")
print(first_gen_hen_count, "Hens")

#Gerate the an appropriate nummber of Rooster Names for Fist Generation and assign
first_gen_rooster_names = random.sample(rooster_names_list, first_gen_rooster_count)
first_gen_rooster['Name'] = np.random.choice(rooster_names_list, size=len(first_gen_rooster))

#Generate the appropriate number/list of Hens Names for First Generation and assign
first_gen_hens_names = random.sample(hen_names_list, first_gen_hen_count)
first_gen_hen['Name'] = np.random.choice(hen_names_list, size=len(first_gen_hen))


first_gen_chickens = pd.concat([first_gen_hen, first_gen_rooster], axis=0)
The first Geration has produced:
================================
55 Roosters
49 Hens
In [49]:
first_gen_chickens.head(5)
Out[49]:
E_ID PH_Name PH_ID PR_ID Generation Color C_ID Favorite_Song Sex Name
1134 884986770 Pat 631311759 800981516 Gen1 Black&White 884986770 Have a Drink On Me Hen GWENN
850 744877899 Pat 631311759 800981516 Gen1 Black&White 744877899 And She Was Hen LOLITA
358 59922539 Mary 225766667 800981516 Gen1 Black&White 59922539 Night Prowler Hen SHAROLYN
1377 34072451 Barb 397591248 800981516 Gen1 Black&White 34072451 Hypnotized Hen RUFINA
799 997044851 Pat 631311759 800981516 Gen1 Black&White 997044851 Limelight Hen FERMINA

Combine Initial Flock and Gen 1 Tables for Chickens, Hens and Roosters¶

In [50]:
chickens_df.columns
chickens_df
Out[50]:
Name Sex Color C_ID Generation Favorite_Song
0 Mary Hen Black&White 225766667 Initial_Flock Paradise By The Dashboard Light
1 Pat Hen Black&White 631311759 Initial_Flock With a Little Luck
2 Barb Hen Black&White 397591248 Initial_Flock Walk On The Wild Side
3 Liz Hen Black&White 630559425 Initial_Flock Wild Horses
4 Ralph Rooster Black&White 800981516 Initial_Flock Pride of Man
In [51]:
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
Out[51]:
Name Sex Color C_ID Generation Favorite_Song
In [52]:
#Reorder first_gen_chickens dataframe columns to match Initial Flock Table Format so we can Concatenate them
In [53]:
chickens_df.columns
Out[53]:
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song'], dtype='object')
In [54]:
#Copy original chickens_df dataframe to initial_flock_chickens for record keeping
initial_flock_chickens = chickens_df
initial_flock_eggs = egg_df
In [55]:
first_gen_chickens.columns
Out[55]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'Generation', 'Color', 'C_ID',
       'Favorite_Song', 'Sex', 'Name'],
      dtype='object')
In [56]:
first_gen_chickens = first_gen_chickens[['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
       'PH_Name', 'PH_ID', 'PR_ID']]
In [57]:
#first_gen_chickens
In [58]:
#Combine Initial Flock Chickens and First Gen Chickens into Chickens Table
chickens_combined_df = pd.concat([chickens_df, first_gen_chickens], axis=0 )
In [59]:
chickens_combined_df.head(9)
Out[59]:
Name Sex Color C_ID Generation Favorite_Song E_ID PH_Name PH_ID PR_ID
0 Mary Hen Black&White 225766667 Initial_Flock Paradise By The Dashboard Light NaN NaN NaN NaN
1 Pat Hen Black&White 631311759 Initial_Flock With a Little Luck NaN NaN NaN NaN
2 Barb Hen Black&White 397591248 Initial_Flock Walk On The Wild Side NaN NaN NaN NaN
3 Liz Hen Black&White 630559425 Initial_Flock Wild Horses NaN NaN NaN NaN
4 Ralph Rooster Black&White 800981516 Initial_Flock Pride of Man NaN NaN NaN NaN
1134 GWENN Hen Black&White 884986770 Gen1 Have a Drink On Me 884986770 Pat 631311759 800981516
850 LOLITA Hen Black&White 744877899 Gen1 And She Was 744877899 Pat 631311759 800981516
358 SHAROLYN Hen Black&White 59922539 Gen1 Night Prowler 59922539 Mary 225766667 800981516
1377 RUFINA Hen Black&White 34072451 Gen1 Hypnotized 34072451 Barb 397591248 800981516
In [60]:
print("The initial Flock Hens generated", egg_df.shape[0], "eggs and", first_gen_chickens.shape[0], "chickens")
The initial Flock Hens generated 2255 eggs and 104 chickens
In [61]:
chickens_df['C_ID'].nunique()
Out[61]:
5
In [62]:
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
Out[62]:
Name Sex Color C_ID Generation Favorite_Song
In [63]:
chickens_df.to_csv('Coop/gen_one_flock_df.csv')

Gen 1 Hens to Generate Eggs for Gen 2 Chicks¶

List and Count for Hens from Gen 1¶

In [64]:
#Assign Egg Rates to Hens from Gen 1
#Randomly Assign Egg Rate to Hens in Base Family
rate1 = np.random.randint(expected_egg_rate_min, expected_egg_rate_max, size=first_gen_hen_count)

first_gen_hen['egg_rate_per_year'] = np.random.choice(rate1, size=len(first_gen_hen))
In [65]:
#first_gen_hen.head(5)
In [66]:
#Create initial eggloggen1_df daily log Pandas Dataframe

eggloggen1_df = pd.DataFrame()
eggloggen1_df['Dates']= pd.date_range(start='10/15/2020', end=today)

eggloggen1_df['diff_years'] = (eggloggen1_df.iloc[-1]['Dates'] - eggloggen1_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
gen1_diff_years = (eggloggen1_df.iloc[-1]['Dates'] - eggloggen1_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
first_gen_hen['egg_count'] = first_gen_hen['egg_rate_per_year'] * gen1_diff_years

#Round Egg Count down as egg counts have to be a whole number, Int.
first_gen_hen['egg_count_round'] = first_gen_hen['egg_count'].apply(np.floor)

#Convert the egg_count from Floats to Integers
first_gen_hen['egg_count_round'] = first_gen_hen['egg_count_round'].astype(int)
In [67]:
print("The total number of Farm Operations Days to date for Gen 1 Hens:", eggloggen1_df.shape[0])
The total number of Farm Operations Days to date for Gen 1 Hens: 867
In [68]:
#The roosters could be from the Original Flock (1) or the  Gen1 Roosters
roosters_df = chickens_combined_df[chickens_combined_df.Sex == 'Rooster']
In [69]:
#The Hens for laying Gen 2 eggs would only include the Gen 1 Hens - first_gen_hen
#first_gen_hen.head(10)
first_gen_hen.columns
Out[69]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'Generation', 'Color', 'C_ID',
       'Favorite_Song', 'Sex', 'Name', 'egg_rate_per_year', 'egg_count',
       'egg_count_round'],
      dtype='object')
In [70]:
print("The number of potentail Roosters for Gen 2 is:", roosters_df.shape[0])
The number of potentail Roosters for Gen 2 is: 56

Eggs¶

In [71]:
print("Looping through each Hen and their associated egg counts per Hen to generate the Egg Table")
#Second Generation of Eggs
eggs2_df = pd.DataFrame()

#Add Parent_Gen Column to the Egg Table: The is Gen1 generating Gen2 eggs
#The Parent_Gen column will help track the operation growth and family, etc.
#Assign Gen1 to the Gen 2 Egg routine
eggs2_df['Parent_Gen'] = "Gen1"

#convert location table to a list to radomonly assign egg location - egg can only have 1 of three locations based on assumptions.
location_list = location_df['IH_Location'].tolist()
number_of_samples = 1
NW = 'False'

#Convert Complete Rooster table to List to radomon aassign Rooster to Egg:
#Use Complete Rooster List
rooster_list = roosters_df['C_ID'].tolist()

#Iterate over each Gen1 Hen

for index, row in first_gen_hen.iterrows():
    eggs_per_hen = row['egg_count_round']
    hen = row['Name']
    PH_ID = row['C_ID']
    #print(row['egg_count_round'])
    print( row['Name'], eggs_per_hen)
    
    for i in range(eggs_per_hen):
        #Create a new random Egg ID - E_ID
        # generte random integer ids
        # generate egg id
        E_ID = np.random.randint(low=1e3, high=1e9)
        Parent_Gen = "Gen1"
        #generate Parent Rooster ID - PR_ID
        PR_ID = random.choices(population=rooster_list, k=number_of_samples)
        
        #Generate Location and assign to each egg
        IH_LOC = random.choices(population=location_list, k=number_of_samples)
                        
        #Append each egg to the Egg Table:
        eggs2_df = eggs2_df.append(pd.DataFrame({'E_ID': E_ID, 'PH_Name': hen, 'PH_ID': PH_ID, 'PR_ID': PR_ID, 'IH_LOC': IH_LOC, 'NW': NW, 'Parent_Gen': Parent_Gen }, index=[0]), ignore_index=True)
Looping through each Hen and their associated egg counts per Hen to generate the Egg Table
GWENN 502
LOLITA 488
SHAROLYN 495
RUFINA 481
FERMINA 486
DEANE 483
MALORIE 462
HELLEN 471
AURA 481
RANA 483
ELFREDA 433
ALINA 483
ALEXANDRA 502
JONELL 476
PENELOPE 455
ZORAIDA 495
BARBIE 471
GIANNA 488
SONG 474
SHARLENE 500
CLARINE 474
JOHANA 476
YANG 448
ROXANNA 467
AKILAH 493
HWA 488
BARBRA 481
ALAYNA 457
SALLIE 505
OCTAVIA 483
RUBIE 497
AI 441
KARMA 512
MARLEEN 495
TOWANDA 500
AKILAH 438
DELIA 441
STEPHANIE 502
JOSIE 481
ILIANA 459
BRANDON 490
CONSUELO 502
ETHYL 502
ELLY 502
KANISHA 457
CATARINA 476
GRETCHEN 488
MARYAM 476
JANETTE 445
In [72]:
eggs2_df.head(3)
Out[72]:
Parent_Gen E_ID PH_Name PH_ID PR_ID IH_LOC NW
0 Gen1 712701509.0 GWENN 884986770.0 879147425.0 B False
1 Gen1 571391458.0 GWENN 884986770.0 197873604.0 C False
2 Gen1 745110216.0 GWENN 884986770.0 167740021.0 B False
In [73]:
eggs2_df.shape[0]
Out[73]:
23485
In [74]:
#Check for duplicate IDs
eggs2_df[eggs2_df.duplicated(['E_ID'], keep=False)]
Out[74]:
Parent_Gen E_ID PH_Name PH_ID PR_ID IH_LOC NW
In [75]:
#Set Location values for Window Location based on IH_LOC == B
eggs2_df.loc[(eggs2_df['IH_LOC'] == 'B'), 'NW'] = True 
In [76]:
#eggs2_df.head(10)

Convert Gen 2 Eggs to Chickens and Combine¶

  • combine Total Eggs Table
  • combine Total Chicken Table
In [77]:
#Provide a smaller rate than the first generation
gen2_eggconversion_ratio = 0.43
In [78]:
second_gen_selection = first_gen_hen[['Name', 'Sex', 'egg_rate_per_year', 'C_ID', 'Generation']]
second_gen_selection['Counts'] = (second_gen_selection['egg_rate_per_year'] / 2 * 0.23).apply(np.floor)

second_gen_selection['Counts'] = second_gen_selection['Counts'].astype(int)
#df['egg_count_round'] = df['egg_count'].apply(np.floor)
print("This selection ratio should provide the farm with", second_gen_selection['Counts'].sum(), "Gen 2 Chickens")
This selection ratio should provide the farm with 1121 Gen 2 Chickens
In [79]:
#second_gen_selection
In [80]:
second_gen_slicer = second_gen_selection[['Name', 'C_ID', 'Counts']]
In [81]:
#second_gen_slicer
In [82]:
#Check for duplicate IDs
eggs2_df[eggs2_df.duplicated(['E_ID'], keep=False)]
Out[82]:
Parent_Gen E_ID PH_Name PH_ID PR_ID IH_LOC NW
In [83]:
chicken_second = pd.DataFrame()

for index, row in second_gen_slicer.iterrows():
    chicks_per_hen = row['Counts']
    hen = row['Name']
    PH_ID = row['C_ID']
    
    
    # Creating a second dataframe that will be the subset of main dataframe
    #print("Second data frame")
    dataframe_second = eggs2_df[['E_ID', 'PH_Name', 'PH_ID', 'PR_ID']].sample(n=chicks_per_hen, replace= False)
    
    #Line to fix loop - Convert Egg ID (E_ID) to Chicken ID before appending to second generation dataframe
    dataframe_second['C_ID'] =dataframe_second['E_ID']
    
    chicken_second = chicken_second.append(dataframe_second)
    chicken_second['Generation'] = 'Gen2'
    chicken_second['Color'] = color
    
    #chicken_second['C_ID'] = chicken_second['E_ID']
    
In [84]:
chicken_second.columns
Out[84]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'C_ID', 'Generation', 'Color'], dtype='object')
In [85]:
#Check for duplicate IDs
chicken_second[chicken_second.duplicated(['E_ID'], keep=False)].sort_values("E_ID")
Out[85]:
E_ID PH_Name PH_ID PR_ID C_ID Generation Color
1321 32940298.0 SHAROLYN 59922539.0 216432846.0 32940298.0 Gen2 Black&White
1321 32940298.0 SHAROLYN 59922539.0 216432846.0 32940298.0 Gen2 Black&White
3785 62652878.0 HELLEN 905132962.0 120625354.0 62652878.0 Gen2 Black&White
3785 62652878.0 HELLEN 905132962.0 120625354.0 62652878.0 Gen2 Black&White
6789 134242878.0 PENELOPE 525986311.0 34072451.0 134242878.0 Gen2 Black&White
6789 134242878.0 PENELOPE 525986311.0 34072451.0 134242878.0 Gen2 Black&White
20796 163863830.0 ELLY 986027613.0 472295211.0 163863830.0 Gen2 Black&White
20796 163863830.0 ELLY 986027613.0 472295211.0 163863830.0 Gen2 Black&White
17668 218035193.0 DELIA 409344934.0 690515072.0 218035193.0 Gen2 Black&White
17668 218035193.0 DELIA 409344934.0 690515072.0 218035193.0 Gen2 Black&White
21565 225911866.0 KANISHA 1363281.0 509012901.0 225911866.0 Gen2 Black&White
21565 225911866.0 KANISHA 1363281.0 509012901.0 225911866.0 Gen2 Black&White
21194 291846482.0 KANISHA 1363281.0 521167154.0 291846482.0 Gen2 Black&White
21194 291846482.0 KANISHA 1363281.0 521167154.0 291846482.0 Gen2 Black&White
13166 378736202.0 ALAYNA 687466224.0 200097845.0 378736202.0 Gen2 Black&White
13166 378736202.0 ALAYNA 687466224.0 200097845.0 378736202.0 Gen2 Black&White
5153 385622112.0 ELFREDA 852945005.0 823242036.0 385622112.0 Gen2 Black&White
5153 385622112.0 ELFREDA 852945005.0 823242036.0 385622112.0 Gen2 Black&White
6330 505844466.0 JONELL 461131488.0 200097845.0 505844466.0 Gen2 Black&White
6330 505844466.0 JONELL 461131488.0 200097845.0 505844466.0 Gen2 Black&White
10191 595994296.0 JOHANA 919605639.0 280225309.0 595994296.0 Gen2 Black&White
10191 595994296.0 JOHANA 919605639.0 280225309.0 595994296.0 Gen2 Black&White
15771 647803701.0 KARMA 899514543.0 120625354.0 647803701.0 Gen2 Black&White
15771 647803701.0 KARMA 899514543.0 120625354.0 647803701.0 Gen2 Black&White
13581 650752941.0 SALLIE 763377021.0 797726697.0 650752941.0 Gen2 Black&White
13581 650752941.0 SALLIE 763377021.0 797726697.0 650752941.0 Gen2 Black&White
9126 718257024.0 SHARLENE 249446444.0 30363405.0 718257024.0 Gen2 Black&White
9126 718257024.0 SHARLENE 249446444.0 30363405.0 718257024.0 Gen2 Black&White
7167 793705703.0 PENELOPE 525986311.0 650882381.0 793705703.0 Gen2 Black&White
7167 793705703.0 PENELOPE 525986311.0 650882381.0 793705703.0 Gen2 Black&White
4624 801863461.0 RANA 923804214.0 314063231.0 801863461.0 Gen2 Black&White
4624 801863461.0 RANA 923804214.0 314063231.0 801863461.0 Gen2 Black&White
18920 803590029.0 ILIANA 598175413.0 120625354.0 803590029.0 Gen2 Black&White
18920 803590029.0 ILIANA 598175413.0 120625354.0 803590029.0 Gen2 Black&White
15185 841854364.0 AI 734213689.0 879147425.0 841854364.0 Gen2 Black&White
15185 841854364.0 AI 734213689.0 879147425.0 841854364.0 Gen2 Black&White
21392 853160964.0 KANISHA 1363281.0 800981516.0 853160964.0 Gen2 Black&White
21392 853160964.0 KANISHA 1363281.0 800981516.0 853160964.0 Gen2 Black&White
22849 898011103.0 MARYAM 683759644.0 457028849.0 898011103.0 Gen2 Black&White
22849 898011103.0 MARYAM 683759644.0 457028849.0 898011103.0 Gen2 Black&White
21743 912603148.0 CATARINA 468233022.0 690515072.0 912603148.0 Gen2 Black&White
21743 912603148.0 CATARINA 468233022.0 690515072.0 912603148.0 Gen2 Black&White
7258 931135376.0 ZORAIDA 830718116.0 763066813.0 931135376.0 Gen2 Black&White
7258 931135376.0 ZORAIDA 830718116.0 763066813.0 931135376.0 Gen2 Black&White
20276 943086742.0 ETHYL 45863615.0 334058719.0 943086742.0 Gen2 Black&White
20276 943086742.0 ETHYL 45863615.0 334058719.0 943086742.0 Gen2 Black&White
18636 952650159.0 JOSIE 379535793.0 762431730.0 952650159.0 Gen2 Black&White
18636 952650159.0 JOSIE 379535793.0 762431730.0 952650159.0 Gen2 Black&White
16871 977189628.0 AKILAH 527948599.0 763066813.0 977189628.0 Gen2 Black&White
16871 977189628.0 AKILAH 527948599.0 763066813.0 977189628.0 Gen2 Black&White
In [86]:
chicken_second['C_ID'].nunique()
Out[86]:
1096
In [87]:
chicken_second.shape[0]
Out[87]:
1121
In [88]:
#Check for duplicate IDs
#chicken_second[chicken_second.duplicated(['C_ID'], keep=False)].sort_values("C_ID")

#df = df.drop_duplicates('column_name', keep='last')
c2_clean_df = chicken_second.drop_duplicates('C_ID', keep='first')
c2_clean_df.shape[0]
Out[88]:
1096
In [89]:
#Verify Eggs ID duplicate IDs are removed and
c2_clean_count = len(c2_clean_df[c2_clean_df.duplicated(['E_ID'], keep=False)].sort_values("E_ID"))
c2_clean_df[c2_clean_df.duplicated(['E_ID'], keep=False)].sort_values("E_ID")
Out[89]:
E_ID PH_Name PH_ID PR_ID C_ID Generation Color
In [90]:
#Reassign if = 0
print(c2_clean_count)
if c2_clean_count == 0:
    print("Re-assigning clean dataframe to second generation of Chickens")
    chicken_second = c2_clean_df
else:
    print("There are still some duplicate records")
0
Re-assigning clean dataframe to second generation of Chickens
In [91]:
#Assign Favorite Song to First Gen Chickens
chicken_second["Favorite_Song"] = np.random.choice(song_list, size=len(chicken_second))

#Assign Sex to First Gen Chickents
chicken_second['Sex'] = np.random.choice(sex_list, size=len(chicken_second))

second_gen_rooster_count = chicken_second[chicken_second['Sex'] == "Rooster"].shape[0]
second_gen_rooster = chicken_second[chicken_second['Sex'] == "Rooster"]

second_gen_hen_count = chicken_second[chicken_second['Sex'] == "Hen"].shape[0]
second_gen_hen = chicken_second[chicken_second['Sex'] == "Hen"]

print("The Second Geration has produced:")
print("================================")
print(second_gen_rooster_count, "Roosters")
print(second_gen_hen_count, "Hens")

#Gerate the an appropriate nummber of Rooster Names for Fist Generation and assign
second_gen_rooster_names = random.sample(rooster_names_list, second_gen_rooster_count)
second_gen_rooster['Name'] = np.random.choice(rooster_names_list, size=len(second_gen_rooster))

#Generate the appropriate number/list of Hens Names for First Generation and assign
second_gen_hens_names = random.sample(hen_names_list, second_gen_hen_count)
second_gen_hen['Name'] = np.random.choice(hen_names_list, size=len(second_gen_hen))


second_gen_chickens = pd.concat([second_gen_hen, second_gen_rooster], axis=0)
The Second Geration has produced:
================================
563 Roosters
533 Hens
In [92]:
second_gen_chickens.shape[0]
Out[92]:
1096
In [93]:
second_gen_chickens.head(5)
Out[93]:
E_ID PH_Name PH_ID PR_ID C_ID Generation Color Favorite_Song Sex Name
1103 870011705.0 SHAROLYN 59922539.0 773595312.0 870011705.0 Gen2 Black&White Wicked Sensation Hen LINNEA
7127 221747441.0 PENELOPE 525986311.0 473017162.0 221747441.0 Gen2 Black&White The Journey/It's Easy Hen SHARICE
12599 263729520.0 BARBRA 832570811.0 14586865.0 263729520.0 Gen2 Black&White Travelin' Man Hen EARLIE
6072 928521689.0 ALEXANDRA 102582398.0 967323615.0 928521689.0 Gen2 Black&White Fire Down Below Hen GITA
23289 992321108.0 JANETTE 428335380.0 119329546.0 992321108.0 Gen2 Black&White Houseparty Hen JERLENE
In [94]:
#Combine Initial Flock Chickens and First Gen Chickens into Chickens Table
chickens_total_df = pd.concat([chickens_combined_df, second_gen_chickens], axis=0 )
In [95]:
chickens_total_df.shape[0]
Out[95]:
1205
In [96]:
chickens_total_df.columns
Out[96]:
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
       'PH_Name', 'PH_ID', 'PR_ID'],
      dtype='object')
In [97]:
chickens_total_df['C_ID'].nunique()
Out[97]:
1203
In [98]:
#c2_clean_df = chicken_second.drop_duplicates('C_ID', keep='first')

chickens_total_df = chickens_total_df.drop_duplicates('C_ID', keep='first')
In [99]:
#Verify Duplicates are removed
chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]
Out[99]:
Name Sex Color C_ID Generation Favorite_Song E_ID PH_Name PH_ID PR_ID
In [100]:
chickens_total_df.to_csv('Coop/chickens_total_df.csv')
In [101]:
#### Combine Both Generation Eggs into Complete Table
eggs_total_df = pd.concat([egg_df, eggs2_df], axis=0)
In [102]:
eggs_total_df.shape[0]
Out[102]:
25740
In [103]:
eggs_total_df.to_csv('Coop/eggs_total_df.csv')

Exercise 3 Name Tags Table Generation¶

In [104]:
chickens_total_df.columns
Out[104]:
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
       'PH_Name', 'PH_ID', 'PR_ID'],
      dtype='object')
In [105]:
chickens_total_df.shape[0]
Out[105]:
1203
In [106]:
ctags_df = chickens_total_df[['Name', 'Favorite_Song', 'C_ID', 'E_ID', 'PH_Name', 'PH_ID', 'PR_ID']]
In [107]:
#Rename PH_Name coloumn to Mother - To provide cleanar data
ctags_df.rename({'PH_Name': 'Mother'}, axis=1, inplace=True)
In [108]:
etags_df = eggs_total_df
In [109]:
etags_df.columns
Out[109]:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW', 'Parent_Gen'], dtype='object')
In [110]:
itag_df = etags_df[['E_ID', 'IH_LOC']]
In [111]:
#Merge Chickens with Eggs to map Chickens to Eggs.
tags_df = pd.merge(ctags_df, etags_df, how='outer', on = 'E_ID' , suffixes=['_Chicken','_Egg'])
In [112]:
tags_df.head(7)
Out[112]:
Name Favorite_Song C_ID E_ID Mother PH_ID_Chicken PR_ID_Chicken PH_Name PH_ID_Egg PR_ID_Egg IH_LOC NW Parent_Gen
0 Mary Paradise By The Dashboard Light 225766667 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Pat With a Little Luck 631311759 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Barb Walk On The Wild Side 397591248 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Liz Wild Horses 630559425 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Ralph Pride of Man 800981516 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 GWENN Have a Drink On Me 884986770 884986770.0 Pat 631311759 800981516 Pat 631311759 800981516 B True Initial_Flock
6 LOLITA And She Was 744877899 744877899.0 Pat 631311759 800981516 Pat 631311759 800981516 C False Initial_Flock
In [113]:
#tags_df.shape[0]

Add Father Name - Join Rooster Names¶

In [114]:
rooster_total_df = chickens_total_df[chickens_total_df['Sex'] == 'Rooster']
In [115]:
rooster_total_df.shape[0]
Out[115]:
617
In [116]:
#rooster_total_df.columns
In [117]:
rooster_tags_df = rooster_total_df[['Name', 'C_ID', 'Generation', 'E_ID', 'PH_Name', 'PH_ID', 'PR_ID']]
In [118]:
#twr_df: Tags with Rooster Data
twr_df = pd.merge(tags_df, rooster_tags_df, how='inner', left_on='PR_ID_Chicken', right_on='C_ID', suffixes=['_t', '_Father'])
In [119]:
twr_df.shape[0]
Out[119]:
1171
In [120]:
#twr_df.head(5)
twr_df.columns
Out[120]:
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother',
       'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
       'IH_LOC', 'NW', 'Parent_Gen', 'Name_Father', 'C_ID_Father',
       'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID'],
      dtype='object')
In [121]:
twr_df[['Name_t', 'Favorite_Song', 'Mother', 'Name_Father']].head(5)
Out[121]:
Name_t Favorite_Song Mother Name_Father
0 GWENN Have a Drink On Me Pat Ralph
1 LOLITA And She Was Pat Ralph
2 SHAROLYN Night Prowler Mary Ralph
3 RUFINA Hypnotized Barb Ralph
4 FERMINA Limelight Pat Ralph
In [122]:
#Export Tags with Roosers to CSV file for data analysis
twr_df.to_csv('Coop/twr_df.csv')
In [123]:
#twr_df.columns
In [124]:
#ctags_df.columns
In [125]:
#Chicken Grand Parents Tags Datafrmae to use for merging so we have less data in the dataframe to merge
cgptags_df = ctags_df[['Name', 'Mother', 'C_ID', 'E_ID']]
#cgptags_df = ctags_df[['Name', 'C_ID', 'E_ID']]

Merge in GrandParents using IDs¶

In [126]:
#tags_with_rooster (Father) and GrandMother - twrgm_df
twrgm_df = pd.merge(twr_df, cgptags_df, how='outer', left_on='PH_ID_Chicken', right_on='C_ID', suffixes=['twr', '_Maternal_Mother'])
#tags_with_rooster_grandmother_and_grandfather
twrgf_df = pd.merge(twrgm_df, cgptags_df, how='outer', left_on='PR_ID', right_on='C_ID', suffixes=['CMM', '_Maternal_Father'])
In [127]:
twrgf_df.columns
Out[127]:
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mothertwr',
       'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
       'IH_LOC', 'NW', 'Parent_Gen', 'Name_Father', 'C_ID_Father',
       'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID',
       'NameCMM', 'Mother_Maternal_Mother', 'C_IDCMM', 'E_IDCMM',
       'Name_Maternal_Father', 'Mother', 'C_ID_Maternal_Father',
       'E_ID_Maternal_Father'],
      dtype='object')
In [128]:
#Rename PH_Name coloumn to Mother - To provide cleanar data
twrgf_df.rename({'Mother_Maternal_Mother': 'GrandMother'}, axis=1, inplace=True)
twrgf_df.rename({'Name_Maternal_Father': 'GrandFather'}, axis=1, inplace=True)
twrgf_df.rename({'Name_Father': 'Father'}, axis=1, inplace=True)

Merge in Incubation Location¶

In [129]:
#twrgf_df['IH_LOC'].head(3)
In [130]:
twrgf_df[['GrandMother', 'GrandFather', 'Father']].head(5)
Out[130]:
GrandMother GrandFather Father
0 NaN NaN Ralph
1 NaN NaN Ralph
2 NaN NaN Ralph
3 NaN NaN Ralph
4 NaN NaN Ralph
In [131]:
#itag_df.columns
In [132]:
EIDs = [ 'E_ID_', 'PH_ID_Egg', 'PR_ID_Egg',  'E_ID_Father', 'E_ID_Maternal_Mother', 'E_ID']
In [133]:
#twrgd2_loc_df.columns
In [134]:
#Merge in Incubation Location for Mother, Father, GrandMother and GrandFather using Egg Table - E_IDs
twrgd_loc_df = pd.merge(twrgf_df, itag_df, left_on=['PH_ID_Chicken'], right_on = ['E_ID'], how='inner', suffixes=['Chicken', '_MotherIL'])
twrgd1_loc_df = pd.merge(twrgd_loc_df, itag_df, left_on=['PR_ID_Egg'], right_on = ['E_ID'], how='inner', suffixes=['CM', '_FatherIL'])
twrgd2_loc_df = pd.merge(twrgd1_loc_df, itag_df, left_on=['E_IDCMM'], right_on = ['E_ID'], how='inner', suffixes=['CMF', '_GrandMother_IL'])
twrgd3_loc_df = pd.merge(twrgd2_loc_df, itag_df, left_on=['E_ID'], right_on = ['E_ID'], how='inner', suffixes=['C', '_GrandFather_IL'])
In [135]:
#Analyze existing Column Names
twrgd3_loc_df.columns
Out[135]:
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mothertwr',
       'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
       'IH_LOCChicken', 'NW', 'Parent_Gen', 'Father', 'C_ID_Father',
       'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID',
       'NameCMM', 'GrandMother', 'C_IDCMM', 'E_IDCMM', 'GrandFather', 'Mother',
       'C_ID_Maternal_Father', 'E_ID_Maternal_Father', 'E_IDCM',
       'IH_LOC_MotherIL', 'E_ID_FatherIL', 'IH_LOCCMF', 'E_ID',
       'IH_LOC_GrandMother_IL', 'IH_LOC'],
      dtype='object')
In [136]:
int_tags_df = twrgd3_loc_df.drop('Mother', axis=1)
int_tags_df.head(3)
Out[136]:
Name_t Favorite_Song C_ID_t E_ID_t Mothertwr PH_ID_Chicken PR_ID_Chicken PH_Name_t PH_ID_Egg PR_ID_Egg ... GrandFather C_ID_Maternal_Father E_ID_Maternal_Father E_IDCM IH_LOC_MotherIL E_ID_FatherIL IH_LOCCMF E_ID IH_LOC_GrandMother_IL IH_LOC
0 SHEMIKA We Will Rock You 311742868.0 311742868.0 SONG 38690046.0 773595312.0 SONG 38690046.0 773595312.0 ... Ralph 800981516 NaN 38690046 B 773595312 C 38690046 B B
1 LOMA Mr. Tinkertrain 521708972.0 521708972.0 SONG 38690046.0 967323615.0 SONG 38690046.0 967323615.0 ... Ralph 800981516 NaN 38690046 B 967323615 B 38690046 B B
2 Dominik Lodi 466892529.0 466892529.0 SONG 38690046.0 457028849.0 SONG 38690046.0 457028849.0 ... Ralph 800981516 NaN 38690046 B 457028849 C 38690046 B B

3 rows × 34 columns

In [137]:
int_tags_df[['IH_LOCChicken',
       'Parent_Gen','Mothertwr', 'Father', 
        'GrandMother', 
       'GrandFather',  
        'IH_LOC_MotherIL', 'IH_LOCCMF',
        'IH_LOC_GrandMother_IL', 'IH_LOC']].head(5)
Out[137]:
IH_LOCChicken Parent_Gen Mothertwr Father GrandMother GrandFather IH_LOC_MotherIL IH_LOCCMF IH_LOC_GrandMother_IL IH_LOC
0 C Gen1 SONG Easton Mary Ralph B C B B
1 A Gen1 SONG Muhammad Mary Ralph B B B B
2 A Gen1 SONG Deonte Mary Ralph B C B B
3 B Gen1 SONG Kamron Mary Ralph B B B B
4 C Gen1 SONG Kamron Mary Ralph B B B B
In [138]:
#Rename PH_Name coloumn to Mother - To provide cleanar data
twrgd3_loc_df.rename({'Mothertwr': 'Mother'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'Nametwr': 'Name'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'IH_LOCChicken': 'I_LOC'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'IH_LOC_MotherIL': 'Mother_I_LOC'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'IH_LOCCMF': 'Father_I_LOC'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'IH_LOC_GrandMother_IL': 'GrandMother_I_LOC'}, axis=1, inplace=True)
twrgd3_loc_df.rename({'IH_LOC': 'GrandFather_I_LOC'}, axis=1, inplace=True)
In [139]:
#Rename PH_Name coloumn to Mother - To provide cleanar data
int_tags_df.rename({'Mothertwr': 'Mother'}, axis=1, inplace=True)
int_tags_df.rename({'Name_t': 'Name'}, axis=1, inplace=True)
int_tags_df.rename({'IH_LOCChicken': 'I_LOC'}, axis=1, inplace=True)
int_tags_df.rename({'IH_LOC_MotherIL': 'Mother_I_LOC'}, axis=1, inplace=True)
int_tags_df.rename({'IH_LOCCMF': 'Father_I_LOC'}, axis=1, inplace=True)
int_tags_df.rename({'IH_LOC_GrandMother_IL': 'GrandMother_I_LOC'}, axis=1, inplace=True)
int_tags_df.rename({'IH_LOC': 'GrandFather_I_LOC'}, axis=1, inplace=True)
In [140]:
int_tags_df.head(5)
Out[140]:
Name Favorite_Song C_ID_t E_ID_t Mother PH_ID_Chicken PR_ID_Chicken PH_Name_t PH_ID_Egg PR_ID_Egg ... GrandFather C_ID_Maternal_Father E_ID_Maternal_Father E_IDCM Mother_I_LOC E_ID_FatherIL Father_I_LOC E_ID GrandMother_I_LOC GrandFather_I_LOC
0 SHEMIKA We Will Rock You 311742868.0 311742868.0 SONG 38690046.0 773595312.0 SONG 38690046.0 773595312.0 ... Ralph 800981516 NaN 38690046 B 773595312 C 38690046 B B
1 LOMA Mr. Tinkertrain 521708972.0 521708972.0 SONG 38690046.0 967323615.0 SONG 38690046.0 967323615.0 ... Ralph 800981516 NaN 38690046 B 967323615 B 38690046 B B
2 Dominik Lodi 466892529.0 466892529.0 SONG 38690046.0 457028849.0 SONG 38690046.0 457028849.0 ... Ralph 800981516 NaN 38690046 B 457028849 C 38690046 B B
3 Brad Helter Skelter 304992598.0 304992598.0 SONG 38690046.0 762431730.0 SONG 38690046.0 762431730.0 ... Ralph 800981516 NaN 38690046 B 762431730 B 38690046 B B
4 Willie South California Purples 313244498.0 313244498.0 SONG 38690046.0 762431730.0 SONG 38690046.0 762431730.0 ... Ralph 800981516 NaN 38690046 B 762431730 B 38690046 B B

5 rows × 34 columns

In [141]:
int_tags_df.to_csv('Coop/tags.csv')
In [142]:
int_tags_df.columns
Out[142]:
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
       'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
       'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
       'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
       'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
       'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
       'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
      dtype='object')
In [143]:
int_tags_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC' ]].head(6)
Out[143]:
Name Favorite_Song Mother Father GrandMother GrandFather I_LOC Mother_I_LOC Father_I_LOC GrandMother_I_LOC GrandFather_I_LOC
0 SHEMIKA We Will Rock You SONG Easton Mary Ralph C B C B B
1 LOMA Mr. Tinkertrain SONG Muhammad Mary Ralph A B B B B
2 Dominik Lodi SONG Deonte Mary Ralph A B C B B
3 Brad Helter Skelter SONG Kamron Mary Ralph B B B B B
4 Willie South California Purples SONG Kamron Mary Ralph C B B B B
5 ESSIE THE CONTINUING STORY OF BUNG SONG Lincoln Mary Ralph A B A B B
In [144]:
int_tags_df.shape[0]
Out[144]:
1052
In [145]:
grand = int_tags_df[twrgf_df['Parent_Gen'] == 'Gen1']
In [146]:
grand.columns
Out[146]:
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
       'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
       'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
       'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
       'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
       'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
       'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
      dtype='object')
In [147]:
grand[['Name',  'Mother', 
       'Parent_Gen', 'Father',
       'GrandMother', 'GrandFather']].head(6)
Out[147]:
Name Mother Parent_Gen Father GrandMother GrandFather
102 Jamie MARLEEN Gen1 Lincoln Mary Ralph
103 Kory MARLEEN Gen1 Lincoln Mary Ralph
104 KEESHA MARLEEN Gen1 Samir Mary Ralph
105 Keshawn MARLEEN Gen1 Tommy Mary Ralph
106 Ruben MARLEEN Gen1 Eli Mary Ralph
107 Asa MARLEEN Gen1 Anderson Mary Ralph
In [148]:
int_tags_df['GrandMother'].nunique()
Out[148]:
4

In [ ]:
 

Create Cousins from GrandMother - GroupBy GrandMother and Assign back to the Table for Tags¶

In [149]:
cousins_df = int_tags_df.groupby('GrandMother')
In [150]:
cousins_df.first()
Out[150]:
Name Favorite_Song C_ID_t E_ID_t Mother PH_ID_Chicken PR_ID_Chicken PH_Name_t PH_ID_Egg PR_ID_Egg ... GrandFather C_ID_Maternal_Father E_ID_Maternal_Father E_IDCM Mother_I_LOC E_ID_FatherIL Father_I_LOC E_ID GrandMother_I_LOC GrandFather_I_LOC
GrandMother
Barb IDELL If You Could Only See 423886016.0 423886016.0 YANG 729890819.0 773595312.0 YANG 729890819.0 773595312.0 ... Ralph 800981516 None 729890819 B 773595312 C 729890819 B B
Liz Lukas AS FAR AS YOU CAN SEE (AS MU 996920976.0 996920976.0 ILIANA 598175413.0 773595312.0 ILIANA 598175413.0 773595312.0 ... Ralph 800981516 None 598175413 C 773595312 C 598175413 C C
Mary SHEMIKA We Will Rock You 311742868.0 311742868.0 SONG 38690046.0 773595312.0 SONG 38690046.0 773595312.0 ... Ralph 800981516 None 38690046 B 773595312 C 38690046 B B
Pat Jakob Hocus Pocus 258968523.0 258968523.0 GWENN 884986770.0 773595312.0 GWENN 884986770.0 773595312.0 ... Ralph 800981516 None 884986770 B 773595312 C 884986770 B B

4 rows × 33 columns

In [151]:
barb_family_list = cousins_df.get_group('Barb')['Name'].tolist()
liz_family_list = cousins_df.get_group('Liz')['Name'].tolist()
mary_family_list = cousins_df.get_group('Mary')['Name'].tolist()
pat_family_list = cousins_df.get_group('Pat')['Name'].tolist()
In [152]:
size=len(barb_family_list)
size
Out[152]:
166
In [153]:
#type(#Assign Favorite Song to First Gen Chickens
#chicken_first["Favorite_Song"] = np.random.choice(song_list, size=len(chicken_first)))
 #   df['b'] = np.where(df.a.values == 0, np.nan, df.b.values)
barb_df = int_tags_df[int_tags_df['GrandMother'] == 'Barb']
liz_df = int_tags_df[int_tags_df['GrandMother'] == 'Liz']
mary_df = int_tags_df[int_tags_df['GrandMother'] == 'Mary']
pat_df = int_tags_df[int_tags_df['GrandMother'] == 'Pat']

barb_df['Cousin'] = np.random.choice(barb_family_list, size=len(barb_family_list))
liz_df['Cousin'] = np.random.choice(liz_family_list, size=len(liz_family_list))
mary_df['Cousin'] = np.random.choice(mary_family_list, size=len(mary_family_list))
pat_df['Cousin'] = np.random.choice(pat_family_list, size=len(pat_family_list))

#int_tags_df['Cousin'] = np.where(int_tags_df['GrandMother'] == 'Barb'), np.random.choice(barb_family_list, size=len(barb_family_list))

barb_c_df = barb_df[['Name', 'C_ID_t', 'Cousin']]
liz_c_df = liz_df[['Name', 'C_ID_t', 'Cousin']]
mary_c_df = mary_df[['Name', 'C_ID_t', 'Cousin']]
pat_c_df = pat_df[['Name', 'C_ID_t', 'Cousin']]

#Stack the individual family cousins into a single datafrme for merging into Chicken Tags with GrandParents and Location
cousins_selected_df = pd.concat([barb_c_df, liz_c_df, mary_c_df, pat_c_df], axis=0)
In [154]:
#Uncomment and run to verify Chicken Name and Cousin Name
#cousins_selected_df.head(10)
In [155]:
int_tags_df.columns
Out[155]:
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
       'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
       'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
       'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
       'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
       'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
       'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
      dtype='object')
In [156]:
int_tags_df[['Name', 'C_ID_t']].head(5)
Out[156]:
Name C_ID_t
0 SHEMIKA 311742868.0
1 LOMA 521708972.0
2 Dominik 466892529.0
3 Brad 304992598.0
4 Willie 313244498.0
In [157]:
name_tags_full_df = pd.merge(int_tags_df, cousins_selected_df, how='outer', on='C_ID_t') 
In [158]:
name_tags_full_df.columns
Out[158]:
Index(['Name_x', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother',
       'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
       'I_LOC', 'NW', 'Parent_Gen', 'Father', 'C_ID_Father', 'Generation',
       'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM',
       'GrandMother', 'C_IDCMM', 'E_IDCMM', 'GrandFather',
       'C_ID_Maternal_Father', 'E_ID_Maternal_Father', 'E_IDCM',
       'Mother_I_LOC', 'E_ID_FatherIL', 'Father_I_LOC', 'E_ID',
       'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Name_y', 'Cousin'],
      dtype='object')
In [159]:
name_tags_full_df.rename({'Name_x': 'Name'}, axis=1, inplace=True)
name_tags_full_df[['Name', 'Cousin']].head(5)
Out[159]:
Name Cousin
0 SHEMIKA Keshawn
1 LOMA Marc
2 Dominik MAGDALEN
3 Brad Conner
4 Willie Emanuel
In [160]:
name_tags_full_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Cousin' ]].head(6)
Out[160]:
Name Favorite_Song Mother Father GrandMother GrandFather I_LOC Mother_I_LOC Father_I_LOC GrandMother_I_LOC GrandFather_I_LOC Cousin
0 SHEMIKA We Will Rock You SONG Easton Mary Ralph C B C B B Keshawn
1 LOMA Mr. Tinkertrain SONG Muhammad Mary Ralph A B B B B Marc
2 Dominik Lodi SONG Deonte Mary Ralph A B C B B MAGDALEN
3 Brad Helter Skelter SONG Kamron Mary Ralph B B B B B Conner
4 Willie South California Purples SONG Kamron Mary Ralph C B B B B Emanuel
5 ESSIE THE CONTINUING STORY OF BUNG SONG Lincoln Mary Ralph A B A B B Clayton
In [161]:
name_tags_full_df.shape[0]
Out[161]:
1052
In [162]:
name_tags_full_df.columns
Out[162]:
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
       'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
       'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
       'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
       'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
       'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
       'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC',
       'Name_y', 'Cousin'],
      dtype='object')
In [163]:
duplicateIDs_nt_df = name_tags_full_df[name_tags_full_df.duplicated(['C_ID_t'], keep=False)]
In [164]:
duplicateIDs_nt_df.to_csv('Coop/duplicated_nt_ids.csv')
In [165]:
### Name Tag clean up to remove any possible duplicates and trim to only 1000 records as that is the number
### defined in the exercise: 1000
#df.drop_duplicates(subset='A', keep="last")
name_tags_full_df.drop_duplicates(subset='Name', keep='first')
Out[165]:
Name Favorite_Song C_ID_t E_ID_t Mother PH_ID_Chicken PR_ID_Chicken PH_Name_t PH_ID_Egg PR_ID_Egg ... E_ID_Maternal_Father E_IDCM Mother_I_LOC E_ID_FatherIL Father_I_LOC E_ID GrandMother_I_LOC GrandFather_I_LOC Name_y Cousin
0 SHEMIKA We Will Rock You 311742868.0 311742868.0 SONG 38690046.0 773595312.0 SONG 38690046.0 773595312.0 ... NaN 38690046 B 773595312 C 38690046 B B SHEMIKA Keshawn
1 LOMA Mr. Tinkertrain 521708972.0 521708972.0 SONG 38690046.0 967323615.0 SONG 38690046.0 967323615.0 ... NaN 38690046 B 967323615 B 38690046 B B LOMA Marc
2 Dominik Lodi 466892529.0 466892529.0 SONG 38690046.0 457028849.0 SONG 38690046.0 457028849.0 ... NaN 38690046 B 457028849 C 38690046 B B Dominik MAGDALEN
3 Brad Helter Skelter 304992598.0 304992598.0 SONG 38690046.0 762431730.0 SONG 38690046.0 762431730.0 ... NaN 38690046 B 762431730 B 38690046 B B Brad Conner
4 Willie South California Purples 313244498.0 313244498.0 SONG 38690046.0 762431730.0 SONG 38690046.0 762431730.0 ... NaN 38690046 B 762431730 B 38690046 B B Willie Emanuel
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1044 SHANTI Jungleland 861338094.0 861338094.0 AKILAH 135192679.0 763066813.0 AKILAH 135192679.0 763066813.0 ... NaN 135192679 C 763066813 B 135192679 C C SHANTI Kory
1046 TANESHA Elected 504981698.0 504981698.0 AKILAH 135192679.0 823242036.0 AKILAH 135192679.0 823242036.0 ... NaN 135192679 C 823242036 C 135192679 C C TANESHA Anderson
1048 SANORA Breakdown 372444260.0 372444260.0 AKILAH 135192679.0 879147425.0 AKILAH 135192679.0 879147425.0 ... NaN 135192679 C 879147425 A 135192679 C C SANORA Kieran
1050 ADELE The Rain Song 551433486.0 551433486.0 AKILAH 135192679.0 650882381.0 AKILAH 135192679.0 650882381.0 ... NaN 135192679 C 650882381 A 135192679 C C ADELE Marc
1051 DORTHY Down On the Corner 631704795.0 631704795.0 AKILAH 135192679.0 409300483.0 AKILAH 135192679.0 409300483.0 ... NaN 135192679 C 409300483 B 135192679 C C DORTHY Arturo

914 rows × 36 columns

In [166]:
name_tags_full_df.to_csv('Coop/name_tags_complete_df.csv')
In [167]:
name_tags_df = name_tags_full_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Cousin' ]]
In [168]:
name_tags_df
Out[168]:
Name Favorite_Song Mother Father GrandMother GrandFather I_LOC Mother_I_LOC Father_I_LOC GrandMother_I_LOC GrandFather_I_LOC Cousin
0 SHEMIKA We Will Rock You SONG Easton Mary Ralph C B C B B Keshawn
1 LOMA Mr. Tinkertrain SONG Muhammad Mary Ralph A B B B B Marc
2 Dominik Lodi SONG Deonte Mary Ralph A B C B B MAGDALEN
3 Brad Helter Skelter SONG Kamron Mary Ralph B B B B B Conner
4 Willie South California Purples SONG Kamron Mary Ralph C B B B B Emanuel
... ... ... ... ... ... ... ... ... ... ... ... ...
1047 Terrence What Do You Do For Money Honey AKILAH Warren Mary Ralph A C C C C Irvin
1048 SANORA Breakdown AKILAH Landon Mary Ralph B C A C C Kieran
1049 Sincere Peg AKILAH Matthew Mary Ralph C C B C C Dillon
1050 ADELE The Rain Song AKILAH Karson Mary Ralph B C A C C Marc
1051 DORTHY Down On the Corner AKILAH Kellen Mary Ralph C C B C C Arturo

1052 rows × 12 columns

In [169]:
name_tags_full_df.shape[0]
Out[169]:
1052
In [170]:
#Per the exercise instruction we will just generate the firt 1000 Tags for the Table.
name_tags_1000_df = name_tags_full_df.head(1000)
In [171]:
#Verify Name Tag Table Shape
name_tags_1000_df.shape
Out[171]:
(1000, 36)
In [172]:
name_tags_df.to_csv('Coop/name_tags_final_df.csv')
name_tags_1000_df.to_csv('Coop/name_tags_1000_final_df.csv')
In [173]:
chickens_total_df['Generation'].nunique()
Out[173]:
3
In [174]:
chickens_total_df.columns
Out[174]:
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
       'PH_Name', 'PH_ID', 'PR_ID'],
      dtype='object')
In [175]:
print("Checking for any duplicate Chickens in the Table = ", len(chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]))
chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]
Checking for any duplicate Chickens in the Table =  0
Out[175]:
Name Sex Color C_ID Generation Favorite_Song E_ID PH_Name PH_ID PR_ID

Bonus Assignment - Show Hatch Dates and make sure the data makes sense¶

In [176]:
## Bonus Assign Egg Date and Hatch Dates to Generation - use to plot
#dailylog_df
#Assume Gen 1 has about 150 days for intial flock starts producing eggs and the eggs hatch 
#eggloggen1_df
gen_one_dates_list = dailylog_df[150:]['Dates'].tolist()
gen_two_dates_list = eggloggen1_df[150:]['Dates'].tolist()
In [177]:
#eggloggen1_df[150:]['Dates']
In [178]:
gen_one_lenght = chickens_total_df[chickens_total_df.Generation == 'Gen1'].shape[0]
gen_two_lenght = chickens_total_df[chickens_total_df.Generation == 'Gen2'].shape[0]
In [179]:
#Dataframes:
#chickens_total_df.head(5)
gen_one_length = chickens_total_df.Generation == 'Gen1'
#Assign Nan values to Chickens with Generation of Initial_Flock for hatch_date
chickens_total_df.loc[chickens_total_df.Generation == 'Initial_Flock', 'Hatch_Date'] = np.nan
chickens_total_df.loc[chickens_total_df.Generation == 'Gen1', 'Hatch_Date'] = np.random.choice(gen_one_dates_list, size=gen_one_lenght)
chickens_total_df.loc[chickens_total_df.Generation == 'Gen2', 'Hatch_Date'] = np.random.choice(gen_two_dates_list, size=gen_two_lenght)

#eggs_total_df
chickens_total_df.head(7)
Out[179]:
Name Sex Color C_ID Generation Favorite_Song E_ID PH_Name PH_ID PR_ID Hatch_Date
0 Mary Hen Black&White 225766667 Initial_Flock Paradise By The Dashboard Light NaN NaN NaN NaN NaN
1 Pat Hen Black&White 631311759 Initial_Flock With a Little Luck NaN NaN NaN NaN NaN
2 Barb Hen Black&White 397591248 Initial_Flock Walk On The Wild Side NaN NaN NaN NaN NaN
3 Liz Hen Black&White 630559425 Initial_Flock Wild Horses NaN NaN NaN NaN NaN
4 Ralph Rooster Black&White 800981516 Initial_Flock Pride of Man NaN NaN NaN NaN NaN
1134 GWENN Hen Black&White 884986770 Gen1 Have a Drink On Me 884986770 Pat 631311759 800981516 2020-12-26 00:00:00
850 LOLITA Hen Black&White 744877899 Gen1 And She Was 744877899 Pat 631311759 800981516 2022-03-26 00:00:00
In [180]:
chickens_total_df.columns
Out[180]:
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
       'PH_Name', 'PH_ID', 'PR_ID', 'Hatch_Date'],
      dtype='object')
In [181]:
liz_df
Out[181]:
Name Favorite_Song C_ID_t E_ID_t Mother PH_ID_Chicken PR_ID_Chicken PH_Name_t PH_ID_Egg PR_ID_Egg ... C_ID_Maternal_Father E_ID_Maternal_Father E_IDCM Mother_I_LOC E_ID_FatherIL Father_I_LOC E_ID GrandMother_I_LOC GrandFather_I_LOC Cousin
73 Lukas AS FAR AS YOU CAN SEE (AS MU 996920976.0 996920976.0 ILIANA 598175413.0 773595312.0 ILIANA 598175413.0 773595312.0 ... 800981516 NaN 598175413 C 773595312 C 598175413 C C Zander
74 Luca Women 988498811.0 988498811.0 ILIANA 598175413.0 967323615.0 ILIANA 598175413.0 967323615.0 ... 800981516 NaN 598175413 C 967323615 B 598175413 C C Domenic
75 MANDIE Mistral Wind 282876868.0 282876868.0 ILIANA 598175413.0 457028849.0 ILIANA 598175413.0 457028849.0 ... 800981516 NaN 598175413 C 457028849 C 598175413 C C Todd
76 Scott Paint It Black 964350220.0 964350220.0 ILIANA 598175413.0 457028849.0 ILIANA 598175413.0 457028849.0 ... 800981516 NaN 598175413 C 457028849 C 598175413 C C Nikhil
77 LUANNA Sirens 545922093.0 545922093.0 ILIANA 598175413.0 762431730.0 ILIANA 598175413.0 762431730.0 ... 800981516 NaN 598175413 C 762431730 B 598175413 C C JANEAN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
932 Pedro Sunday Bloody Sunday 871067952.0 871067952.0 JANETTE 428335380.0 823242036.0 JANETTE 428335380.0 823242036.0 ... 800981516 NaN 428335380 B 823242036 C 428335380 B B Koby
933 BRIGETTE Hi Hi Hi 213573332.0 213573332.0 JANETTE 428335380.0 70708035.0 JANETTE 428335380.0 70708035.0 ... 800981516 NaN 428335380 B 70708035 C 428335380 B B KARENA
934 MIRIAN Rocky Mountain Way 915967235.0 915967235.0 JANETTE 428335380.0 857956630.0 JANETTE 428335380.0 857956630.0 ... 800981516 NaN 428335380 B 857956630 A 428335380 B B MICHELE
935 JERLENE Houseparty 992321108.0 992321108.0 JANETTE 428335380.0 119329546.0 JANETTE 428335380.0 119329546.0 ... 800981516 NaN 428335380 B 119329546 B 428335380 B B Aidan
936 Larry Born To Run 833217012.0 833217012.0 JANETTE 428335380.0 472295211.0 JANETTE 428335380.0 472295211.0 ... 800981516 NaN 428335380 B 472295211 C 428335380 B B BASILIA

320 rows × 35 columns

In [182]:
eggs_total_df.shape[0]
Out[182]:
25740
In [183]:
#Remove Eggs that Hatched into Chickens using a left outer join
egg_only_df = pd.merge(eggs_total_df, chickens_total_df, how='outer', on='E_ID', indicator=True).query('_merge=="left_only"')
In [184]:
egg_only_df.shape[0]
Out[184]:
24542
In [185]:
#egg_only_df.columns
In [186]:
# Add Egg Dates to Eggs.
#df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
egg_only_df.drop(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'PH_Name_y', 'PH_ID_y', 'PR_ID_y', 'Hatch_Date', '_merge'], axis=1, inplace=True)

egg_only_df.head(5)
Out[186]:
E_ID PH_Name_x PH_ID_x PR_ID_x IH_LOC NW Parent_Gen
0 592323119.0 Mary 225766667 800981516 C False Initial_Flock
1 410431189.0 Mary 225766667 800981516 C False Initial_Flock
2 729054692.0 Mary 225766667 800981516 B True Initial_Flock
3 630362478.0 Mary 225766667 800981516 A False Initial_Flock
4 796885249.0 Mary 225766667 800981516 C False Initial_Flock
In [187]:
egg_lenght = egg_only_df.shape[0]
egg_only_df['Egg_Date'] = np.random.choice(gen_one_dates_list, size=egg_lenght)
egg_only_df.head(5)
Out[187]:
E_ID PH_Name_x PH_ID_x PR_ID_x IH_LOC NW Parent_Gen Egg_Date
0 592323119.0 Mary 225766667 800981516 C False Initial_Flock 2022-05-21
1 410431189.0 Mary 225766667 800981516 C False Initial_Flock 2020-09-22
2 729054692.0 Mary 225766667 800981516 B True Initial_Flock 2022-06-15
3 630362478.0 Mary 225766667 800981516 A False Initial_Flock 2021-11-29
4 796885249.0 Mary 225766667 800981516 C False Initial_Flock 2021-08-26

Bonus Convert Dataframes to DuckDB database for integration with Metabse¶

In [188]:
#Connect jupysql to DuckDB using a SQLAlchemy-style connection string. Either connect to an in memory DuckDB, or a file backed db.

%sql duckdb:///Coop/coop_farm_db.duckdb
In [189]:
### Convert all the necessary Dataframes to DuckDB database for integration with Metabse
# create the table "my_table" from the DataFrame "my_df"
%sql CREATE TABLE chicken_name_tags as select * from name_tags_df ;
Out[189]:
Count
0 1052
In [190]:
# insert into the table "chicken_name_tags" from the DataFrame "name_tags_df"
#%sql INSERT INTO chicken_name_tags SELECT * FROM name_tags_df;
In [191]:
#Execute a Select query to see all the table data in the chicken_name_tags table
#%sql select * from chicken_name_tags;
In [192]:
#Show Tables in the DucDB Database
%sql show tables;
Out[192]:
name
0 chicken_name_tags
In [193]:
name_tags_full_df.shape
Out[193]:
(1052, 36)
In [194]:
#Investigate and import the Dataframes in to DuckDB Tables.
In [195]:
#List of Dataframes to ingest into DuckDB Tables
#Songs table dataframe - 
dataframes = ['songs_df', 'egg_only_df', 'chickens_total_df','int_tags_df', 'barb_df', 'liz_df', 'mary_df', 'pat_df', 'name_tags_full_df']
#name_tags_df - Already Ingested
In [196]:
dataframes
Out[196]:
['songs_df',
 'egg_only_df',
 'chickens_total_df',
 'int_tags_df',
 'barb_df',
 'liz_df',
 'mary_df',
 'pat_df',
 'name_tags_full_df']
In [197]:
#Loop and Create Table, import and convert to Parquet in Duckdb.duckdb

for dataframe in dataframes:
    print("creating table:", dataframe)
    %sql CREATE TABLE {dataframe} as select * from {dataframe} ;
   # %sql INSERT INTO {dataframe} SELECT * FROM {dataframe};
    
    #%sql COPY {table} TO '{table}.parquet' (FORMAT PARQUET);
creating table: songs_df
creating table: egg_only_df
creating table: chickens_total_df
creating table: int_tags_df
creating table: barb_df
creating table: liz_df
creating table: mary_df
creating table: pat_df
creating table: name_tags_full_df
In [198]:
%sql select * from chicken_name_tags;
Out[198]:
Name Favorite_Song Mother Father GrandMother GrandFather I_LOC Mother_I_LOC Father_I_LOC GrandMother_I_LOC GrandFather_I_LOC Cousin
0 SHEMIKA We Will Rock You SONG Easton Mary Ralph C B C B B Keshawn
1 LOMA Mr. Tinkertrain SONG Muhammad Mary Ralph A B B B B Marc
2 Dominik Lodi SONG Deonte Mary Ralph A B C B B MAGDALEN
3 Brad Helter Skelter SONG Kamron Mary Ralph B B B B B Conner
4 Willie South California Purples SONG Kamron Mary Ralph C B B B B Emanuel
... ... ... ... ... ... ... ... ... ... ... ... ...
1047 Terrence What Do You Do For Money Honey AKILAH Warren Mary Ralph A C C C C Irvin
1048 SANORA Breakdown AKILAH Landon Mary Ralph B C A C C Kieran
1049 Sincere Peg AKILAH Matthew Mary Ralph C C B C C Dillon
1050 ADELE The Rain Song AKILAH Karson Mary Ralph B C A C C Marc
1051 DORTHY Down On the Corner AKILAH Kellen Mary Ralph C C B C C Arturo

1052 rows × 12 columns

In [199]:
#After testing - discovered Metabase doesn't currently support the latest version of DuckDB so we export the database as a set
#of Parquet files which allow us to use in the Metabase DuckDB connection as files.d
%sql EXPORT DATABASE 'Coop' (FORMAT PARQUET);
Out[199]:
In [200]:
%sql select count(*) from name_tags_full_df;
Out[200]:
count_star()
0 1052
In [ ]: